In Proceedings of LREC-2002 Workshop Data Collection and Language Technologies for Mapudungun

Authors

  • Lori Levin
  • Rodolfo Vega
  • Alon Lavie
  • Eliseo Cañulef
  • Carolina Huenchullan
Abstract

Mapudungun is spoken by over 900,000 people (Mapuche) in Chile and Argentina. Thanks to an active bilingual and multicultural education program, Mapuche children are now being taught to be literate in both Mapudungun and Spanish. The Chilean Ministry of Education (Mineduc) has teamed up with the AVENUE project of the Language Technologies Institute (LTI) to collect data and produce language technologies that support bilingual education. The main resource to come out of the Mineduc-LTI partnership is a Mapudungun-Spanish parallel corpus consisting of approximately 200,000 words of text and 120 hours of transcribed speech. Plans are being made for machine translation and computer-assisted instruction.

Similar resources

Data Collection and Language Technologies for Mapudungun

Mapudungun is spoken by over 900,000 people (Mapuche) in Chile and Argentina. Thanks to an active bilingual and multicultural education program, Mapuche children are now being taught to be literate in both Mapudungun and Spanish. The Chilean Ministry of Education has teamed up with the Language Technologies Institute’s AVENUE project to collect data and produce language technologies that suppor...

Data Collection and Analysis of Mapudungun Morphology for Spelling Correction

This paper describes part of a three-year collaboration between Carnegie Mellon University's Language Technologies Institute, the Programa de Educación Intercultural Bilingüe of the Chilean Ministry of Education, and Universidad de La Frontera (Temuco, Chile). We are currently constructing a spelling checker for Mapudungun, a polysynthetic language spoken by the Mapuche people in Chile and Arge...

Message from the Program Chair

The Third NTCIR Workshop is the third venture in a series of evaluation workshops designed to enhance research in information access technologies including text retrieval, cross language information retrieval, automatic text summarization, information extraction, and question answering on Japanese and Asian language text. The goals of the NTCIR Workshops are as follows: 1. to encourage research...

Building NLP Systems for Two Resource-Scarce Indigenous Languages: Mapudungun and Quechua

By adopting a “first-things-first” approach we overcome a number of challenges inherent in developing NLP systems for resource-scarce languages. By first gathering the necessary corpora and lexicons we are then enabled to build, for Mapudungun, a spelling corrector, morphological analyzer, and two Mapudungun-Spanish machine translation systems; and for Quechua, a morphological analyzer as well as...

Towards an International Standard on Feature Structure Representation

This paper describes the preliminary results of a joint initiative of the TEI (Text Encoding Initiative) Consortium and the ISO Committee TC 37/SC 4 (Language Resource Management) to provide a standard for the representation and interchange of feature structures. The paper published in the proceedings of this workshop is in fact an extension of a paper published in the LREC 2004 proceedings, and...

Publication date: 2002